ABSTRACT


Many  algorithms  for  obtaining  global  solutions  to  nonconvex  optimization 
problems  have  been  proposed  in  recent  years.  The  methods  farthest  along 
computationally  are  those  for  separable  problems.  These  use  linear 
programming  codes  to  solve  sequences  of  LP  problems  formed  from  piece-wise 
linear  approximations  to  the  nonlinear  functional  forms.  For  a large  class  of 
optimization  problems,  called  factorable  programming  problems,  it  is  possible 
to create equivalent separable problems. This is done at a cost: additional
variables and constraints. In this paper the procedure for creating the
equivalent separable problems is outlined and a brief description is given of
a global solution algorithm due to Falk. A small example is given illustrating
the above techniques. The example is also solved using a more direct
method.  Application  to  the  solution  of  nonlinear  least  squares  is  illustrated 
with  another  example.  Discussion  of  areas  of  research  for  improving  the 
efficiency  of  this  approach  concludes  the  paper. 
1. Global Solutions to Separable Problems


Solving nonconvex programming problems, i.e., optimization problems where
local minimizers may not be global minimizers, was thought not long ago to be
relegated to heuristic algorithms. (See [McCormick 1972a] for a survey of
some  of  these  methods.)  Recent  investigators  (e.g.  [Falk  and  Soland  1969], 
[Soland  1971],  [Falk  1972],  [Beale  and  Tomlin  1970],  [Hoffman  1981],  [Mancini 
and  McCormick  1976],  [Mancini  and  McCormick  1979],  [McCormick  1980])  have 
developed  rigorous  algorithms  for  which  convergence  to  global  minimizers  can 
be  proved.  The  theories  are  well-established  and  some  computational  results 
are  available. 

The  algorithms  which  have  the  most  computational  development  are  those  which 
solve  separable  optimization  problems  using  linear  programming  codes. 
Separable  programming  problems  have  the  following  form: 

$$\begin{aligned}
\min_{x \in R^n}\quad & F_0(x) \\
\text{subject to}\quad & F_i(x) \le b_i, \quad i = 1, \ldots, m, \\
& \ell_j \le x_j \le L_j, \quad j = 1, \ldots, n,
\end{aligned} \qquad (Q)$$

where each $F_i(x)$ is written

$$F_i(x) = \sum_{j=1}^{n} F_{ij}(x_j), \quad i = 0, 1, \ldots, m.$$

(A more general formulation allows for equality constraints, i.e., for
$F_i(x) = b_i$, $i = m + 1, \ldots, p$. However, for simplicity of presentation,
only the inequality constrained problem is considered in this section.)
The basic idea behind most of these methods is to approximate each nonlinear
functional by a piece-wise linear functional and to solve the resulting problem
by solving a finite sequence of linear programming problems. The computer
programs used to solve the example of this paper implemented the algorithm of
Falk, which is briefly described below. For more details the reader is referred
to [Falk 1972] or Appendix A in [Grotte 1978].


The approximating problem of the original problem (Q) is obtained by replacing
each function $F_{ij}$ by a piece-wise linear approximation over the interval
$[\ell_j, L_j]$. This involves selecting $p_j + 1$ grid points
$y_{j0}, \ldots, y_{jp_j}$ in $[\ell_j, L_j]$, where $y_{j0} = \ell_j$ and
$y_{jp_j} = L_j$. Suppose $x_j^*$ is some point in $[\ell_j, L_j]$. Let
$y_{jk^*}$, $y_{jk^*+1}$, $\theta_{jk^*}$ be the unique values where

$$x_j^* = \theta_{jk^*}\, y_{jk^*} + (1 - \theta_{jk^*})\, y_{jk^*+1}, \qquad 0 \le \theta_{jk^*} \le 1.$$

Then

$$F_{ij}(x_j^*) \approx \theta_{jk^*} F_{ij}(y_{jk^*}) + (1 - \theta_{jk^*}) F_{ij}(y_{jk^*+1}).$$


Figure  1 represents  this  type  of  approximation. 


There are obviously many ways these $p_j + 1$ points can be selected. The
computer program implementing this algorithm allows the user to specify the
value $p_j$. It then computes equally spaced points.
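The grid construction and the interpolation weights described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `grid`, `weights`, and `pw_linear` are hypothetical and are not part of the program discussed in the text, and the grid is taken to be equally spaced.

```python
import math

def grid(lo, hi, p):
    """Return p + 1 equally spaced grid points on [lo, hi]."""
    return [lo + (hi - lo) * k / p for k in range(p + 1)]

def weights(x, pts):
    """Weights on the grid points: at most two nonzero, on adjacent points."""
    p = len(pts) - 1
    step = pts[1] - pts[0]
    k = min(max(int((x - pts[0]) / step), 0), p - 1)
    # x = theta * pts[k] + (1 - theta) * pts[k + 1]
    theta = (pts[k + 1] - x) / (pts[k + 1] - pts[k])
    w = [0.0] * (p + 1)
    w[k], w[k + 1] = theta, 1.0 - theta
    return w

def pw_linear(f, x, pts):
    """Piece-wise linear approximation of f at x from its grid values."""
    return sum(wk * f(yk) for wk, yk in zip(weights(x, pts), pts))

pts = grid(0.0, 10.0, 8)          # nine grid points on [0, 10]
w = weights(3.0, pts)             # weights representing x = 3.0
approx = pw_linear(math.sin, 3.0, pts)
```

The weights returned for any point in the interval sum to one and reproduce the point as a convex combination of two neighbouring grid points, which is exactly the adjacency requirement that a plain linear program does not enforce by itself.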


Figure 1. Piecewise Linear Approximations

Let $K_j = \{0, 1, \ldots, p_j\}$. The approximating problem can be stated in terms of
the weights on the grid points, the $\theta_{jk}$'s:
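$$\min_{\theta}\ \sum_{j=1}^{n} \sum_{k \in K_j} \theta_{jk} F_{0j}(y_{jk})$$

subject to

$$\sum_{j=1}^{n} \sum_{k \in K_j} \theta_{jk} F_{ij}(y_{jk}) \le b_i, \quad i = 1, \ldots, m,$$

$$\sum_{k \in K_j} \theta_{jk} = 1, \quad j = 1, \ldots, n, \qquad (P)$$

$$\theta_{jk} \ge 0, \quad j = 1, \ldots, n;\ k \in K_j,$$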


and (for $j = 1, \ldots, n$) at most two of the weights $\{\theta_{jk},\ k \in K_j\}$ can be
nonzero. If two are nonzero, they must correspond to adjacent grid points.
This last constraint is the adjacency weight restriction (AWR).

If it were not for the AWR, the approximating problem (P) would be a simple
linear programming problem with variables $\{\theta_{jk}\}$. This restriction is
necessary in order that the piecewise linear approximation indicated be
valid.

In  [Falk  1972]  a branch  and  bound  algorithm  for  solving  (P)  is  presented.  It 
involves  solving  a sequence  of  linear  programming  problems  without  the 
adjacency weight restrictions. The details of his algorithm, however, will not
be pursued here.

Most  nonlinear  programming  problems  are,  in  their  simplest  formulation,  not 
separable. For a large class of problems, called factorable programming
problems, it is possible to create separable programs which are "equivalent" to
the original ones in the sense that local minimizers are in a one-to-one
correspondence. This is done at a cost: an increase in the number of
variables and constraints.

The  description  of  factorable  optimization  problems  is  in  the  next  section. 
2. Factorable Programming Problems


A factorable programming problem is one which can be written in the following
form:

$$\min\ f_N(x) = \sum_{p=1}^{N-1} T_{N,p}[f_p(x)] + \sum_{p=1}^{N-1} \sum_{q=1}^{p} U_{N,p,q}[f_q(x)] \cdot V_{N,q,p}[f_p(x)]$$

subject to

$$a_j \le f_j(x) = x_j \le b_j, \quad j = 1, \ldots, n, \qquad (2.1)$$

$$a_j \le f_j(x) = \sum_{p=1}^{j-1} T_{j,p}[f_p(x)] + \sum_{p=1}^{j-1} \sum_{q=1}^{p} U_{j,p,q}[f_q(x)] \cdot V_{j,q,p}[f_p(x)] \le b_j, \quad j = m+1, \ldots, N-1,$$

where some of the $a_j$'s are possibly equal to $-\infty$, and the
T's, U's and V's are scalar functions of one variable.

Most practical nonlinear programming problems are factorable. The reader is
referred to [McCormick 1975] and [Ghaemi and McCormick 1979] for a fuller
discussion of this subject.

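Consider the following nonlinear programming problem:

$$\min_{(x_1, x_2)}\ \mathrm{ERF}(x_1 + x_2) + \sin(x_1) \cdot \exp(-.5 x_2)$$

subject to

$$-(x_1 + x_2)^2 \le -10, \qquad (2.2)$$

$$0 \le x_1 \le 10, \quad 0 \le x_2 \le 10,$$

where

$$k = 2/\sqrt{\pi}, \qquad \mathrm{ERF}(z) = k \int_0^{z} \exp(-t^2)\, dt.$$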
o 
Written as a factorable programming problem ($f_j = f_j(x)$) this has the form:

$$\min_{(x_1, x_2)}\ f_5 = \mathrm{ERF}(f_3) + \sin(f_1) \cdot \exp(-.5 f_2) \qquad (2.3)$$

subject to

$$0 \le f_1 = x_1 \le 10,$$

$$0 \le f_2 = x_2 \le 10,$$

$$-\infty \le f_3 = f_1 + f_2 \le +\infty,$$

$$-\infty \le f_4 = -(f_3)^2 \le -10.$$

The  isovalue  contours  of  the  objective  function  are  plotted  in  Figure  2. 

There are two local minimizers, one at (10, 0) with objective function value
.45599, and another at approximately $(3\pi/2, 0)$ with objective function value
approximately 0. If the inequality constraint $(x_1 + x_2)^2 \ge 10$ were removed
from the problem, (0, 0) would also be a local minimizer. The point $(3\pi/2, 0)$
is the global minimizer in both cases, with a function value slightly less than
zero.
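The objective values quoted for the two local minimizers can be checked numerically. The sketch below assumes that ERF as defined in (2.2) coincides with the standard error function, which is what Python's `math.erf` computes; the helper names `objective` and `feasible` are illustrative only.

```python
import math

def objective(x1, x2):
    # ERF(z) = (2/sqrt(pi)) * integral from 0 to z of exp(-t^2) dt = math.erf(z)
    return math.erf(x1 + x2) + math.sin(x1) * math.exp(-0.5 * x2)

def feasible(x1, x2):
    # Bounds and the nonconvex constraint -(x1 + x2)^2 <= -10 of problem (2.2).
    return 0 <= x1 <= 10 and 0 <= x2 <= 10 and (x1 + x2) ** 2 >= 10

for point in [(10.0, 0.0), (1.5 * math.pi, 0.0)]:
    print(point, feasible(*point), objective(*point))
```

Both points are feasible, and the second has the smaller objective value, in agreement with the statement that $(3\pi/2, 0)$ is the global minimizer.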
Figure 2. Isovalue Contours of Two Variable Problem
3. Creating Equivalent Separable Problems


The  techniques  which  can  be  used  to  convert  most  nonlinear  programming 
problems  into  "equivalent"  separable  programs  are  part  of  the  tradition  of 
optimization. The origins of these simple techniques are lost in the past. A
description of two of these is contained in [McCormick 1972b]. Below is a
summary statement of these techniques.

The  method  for  creating  equivalent  separable  problems  has  two  basic  steps 
which  are  used  repeatedly  until  they  can  no  longer  be  applied. 

Step 1. If the optimization problem has a product term of the form
$q_1(x) \cdot q_2(x)$, replace that product by $(z_1)^2 - (z_2)^2$ wherever it occurs,
and introduce the equality constraints

$$q_1(x) = z_1 + z_2,$$

$$q_2(x) = z_1 - z_2.$$

The problem variables are augmented to include $z_1$ and $z_2$.

Step 2. If the optimization problem has a term of the form $T[t(x)]$ where $T(t)$
is a scalar function and $t(x)$ is a scalar function of more than one
variable, replace $T[t(x)]$ by $T(y)$ wherever it occurs, add the
equality constraint $t(x) = y$, and augment the variable list with $y$.
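Step 1 rests on the identity $(z_1 + z_2)(z_1 - z_2) = (z_1)^2 - (z_2)^2$. A minimal numerical sketch follows; the helper name `separate_product` is hypothetical, and the sample point is chosen only for illustration.

```python
import math

def separate_product(q1, q2):
    """Return (z1, z2) satisfying q1 = z1 + z2 and q2 = z1 - z2."""
    return 0.5 * (q1 + q2), 0.5 * (q1 - q2)

# Product term sin(x1) * exp(-.5 * x2) evaluated at an illustrative point.
x1, x2 = 5.0, 0.0
q1, q2 = math.sin(x1), math.exp(-0.5 * x2)
z1, z2 = separate_product(q1, q2)

# The product is recovered as a difference of two squares in z1 and z2.
assert math.isclose(z1 ** 2 - z2 ** 2, q1 * q2)
```

Each of the two squares depends on a single variable, so the nonseparable product has been traded for two separable terms and two equality constraints.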

Consider the example given earlier. Applying step 2 for the expression
$t(x) = x_1 + x_2$ yields the program
$$\min_{(x_1, x_2, y_1)}\ \mathrm{ERF}(y_1) + \sin(x_1) \cdot \exp(-.5 x_2) \qquad (3.1)$$

subject to

$$-(y_1)^2 \le -10,$$

$$x_1 + x_2 = y_1,$$

$$0 \le x_1 \le 10,$$

$$0 \le x_2 \le 10.$$

Application  of  Step  1 to  the  product  term  in  the  objective  function  yields  the 
equivalent  separable  program 

$$\min_{(x_1, x_2, y_1, z_1, z_2)}\ \mathrm{ERF}(y_1) + (z_1)^2 - (z_2)^2 \qquad (3.2)$$

subject to

$$-(y_1)^2 \le -10,$$

$$x_1 + x_2 = y_1,$$

$$\sin(x_1) = z_1 + z_2,$$

$$\exp(-.5 x_2) = z_1 - z_2,$$

$$0 \le x_1 \le 10, \quad 0 \le x_2 \le 10.$$

For  this  simple  problem,  separation  occurred  after  just  two  applications  of 
the  separating  techniques.  Often  the  situation  is  more  complicated. 

The "equivalence" of the problems is shown in [McCormick 1972c], where it is
shown that local minimizers are in one-to-one correspondence. For the purpose
here it is sufficient to emphasize that if $(x_1, x_2, y_1, z_1, z_2)$ is a local
(global) minimizer for (3.2), then $(x_1, x_2)$ is a local (global) minimizer for
(3.1).
Not all nonlinear programming problems can be separated by repeated
application of steps 1 and 2. In particular, if a program cannot be written
as a factorable optimization problem it probably cannot be separated.
Certainly, if it can be written in factorable form, separation is possible.

Make the correspondence $y_j = x_j$, $j = 1, \ldots, m$. Then using steps 1 and 2 on
the factorable programming problem (2.1) yields directly

$$\min_{(y, z)}\ \sum_{p=1}^{N-1} T_{N,p}(y_p) + \sum_{p=1}^{N-1} \sum_{q=1}^{p} \left[ (z^{N,1}_{q,p})^2 - (z^{N,2}_{q,p})^2 \right] \qquad (3.3)$$

subject to the inequality constraints

$$a_j \le y_j \le b_j, \quad j = 1, \ldots, N-1, \qquad (3.4)$$

and the equality constraints

$$y_j = \sum_{p=1}^{j-1} T_{j,p}(y_p) + \sum_{p=1}^{j-1} \sum_{q=1}^{p} \left[ (z^{j,1}_{q,p})^2 - (z^{j,2}_{q,p})^2 \right], \quad j = m+1, \ldots, N-1; \qquad (3.5)$$

$$U_{j,p,q}(y_q) = z^{j,1}_{q,p} + z^{j,2}_{q,p},$$

$$V_{j,p,q}(y_p) = z^{j,1}_{q,p} - z^{j,2}_{q,p},$$

$$j = n+1, \ldots, N; \quad p = 1, \ldots, j-1; \quad q = 1, \ldots, p. \qquad (3.6)$$

Here $y = (x_1, \ldots, x_m, y_{m+1}, \ldots, y_{N-1})^T$, and $z$ is the vector containing
the $z^{j,1}_{q,p}$, $z^{j,2}_{q,p}$'s appearing above.

Application of these equations to the example problem given in form (2.3)
yields the separable problem
$$\min\ \mathrm{ERF}(y_3) + (z^{5,1}_{1,2})^2 - (z^{5,2}_{1,2})^2 \qquad (3.8)$$

subject to

$$0 \le y_1 \le 10,$$

$$0 \le y_2 \le 10,$$

$$-\infty \le y_3 \le +\infty,$$

$$-\infty \le y_4 \le -10, \qquad (3.9)$$

$$y_3 = y_1 + y_2,$$

$$y_4 = -(y_3)^2, \qquad (3.10)$$

$$\sin(y_1) = z^{5,1}_{1,2} + z^{5,2}_{1,2},$$

$$\exp(-.5 y_2) = z^{5,1}_{1,2} - z^{5,2}_{1,2},$$

where the problem variables are

$$(y_1, y_2, y_3, y_4, z^{5,1}_{1,2}, z^{5,2}_{1,2}).$$


Problem (3.8) is similar to the separable problem (3.2) obtained directly by
applying steps 1 and 2 except that it has one more variable and one more
equality constraint. This is an example of a general modification which
should be made to the separation rules used to create (3.3). If some $f_j$
($j = m+1, \ldots, N-1$) does not enter into the computation of any later $f_k$, the
introduction of the new variable $y_j$ should be omitted, and no equality
constraint of the form (3.5) should be included. The simple bound in (3.4) in
this instance should remain

$$a_j \le \sum_{p=1}^{j-1} T_{j,p}(y_p) + \sum_{p=1}^{j-1} \sum_{q=1}^{p} \left[ (z^{j,1}_{q,p})^2 - (z^{j,2}_{q,p})^2 \right] \le b_j.$$
Using this modification, problem (3.8) above would be changed in that $y_4$ would
no longer be a variable, (3.10) would be omitted, and (3.9) would become
$-\infty \le -(y_3)^2 \le -10$. It is then equivalent, except for the variable names, to
problem (3.2).
4. Obtaining Bounds on the New Variables


Algorithms which obtain global solutions to separable optimization problems
invariably require upper and lower bounds on the variables. An inductive
process to do this when all original variables $\{x_j\}$ are bounded is given
below.


The  basic  assumption  here  is  that  it  is  possible  to  compute  the  suprema  and 
infima  of  T's,  U's,  and  V's  used  in  defining  the  factorable  programming 
problem  (2.1). 

Define 

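$$\ell(T,p,j) = \inf\ T_{j,p}(y) \quad \text{s.t. } a_p \le y \le b_p,$$

$$u(T,p,j) = \sup\ T_{j,p}(y) \quad \text{s.t. } a_p \le y \le b_p,$$

$$\ell(U,p,q,j) = \inf\ U_{j,p,q}(y) \quad \text{s.t. } a_q \le y \le b_q,$$

$$u(U,p,q,j) = \sup\ U_{j,p,q}(y) \quad \text{s.t. } a_q \le y \le b_q.$$

Analogous notation is used for the other quantities.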

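Then the lower bound on $y_j$ is given by

$$y_j \ge \max(a_j, c_j)$$

where

$$c_j = \sum_{p=1}^{j-1} \ell(T,p,j) + \sum_{p=1}^{j-1} \sum_{q=1}^{p} \min\left[ \ell(U,p,q,j) \cdot \ell(V,p,q,j),\ \ell(U,p,q,j) \cdot u(V,p,q,j),\ u(U,p,q,j) \cdot \ell(V,p,q,j),\ u(U,p,q,j) \cdot u(V,p,q,j) \right].$$

The smallest of the four corner products is used because it is the least value a
product of two bounded factors can take. Analogous bounds are computed for the
other quantities.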
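The bounding rule for a product term can be sketched with elementary interval arithmetic. The sketch assumes that the infimum and supremum of each single-variable factor are already known; `product_bounds` is a hypothetical helper name, and the numerical ranges are those of the example problem.

```python
import math

def product_bounds(lu, uu, lv, uv):
    """Range of u * v for u in [lu, uu] and v in [lv, uv].

    The extremes of a product over a box are attained at corner points,
    so the smallest corner product is a lower bound and the largest an
    upper bound.
    """
    corners = [lu * lv, lu * uv, uu * lv, uu * uv]
    return min(corners), max(corners)

# Ranges of sin(y1) and exp(-.5 * y2) for 0 <= y1, y2 <= 10.
sin_lo, sin_hi = -1.0, 1.0
exp_lo, exp_hi = math.exp(-5.0), 1.0
prod_lo, prod_hi = product_bounds(sin_lo, sin_hi, exp_lo, exp_hi)

# Bounds on a sum are the sums of the bounds: y3 = y1 + y2.
y3_lo, y3_hi = 0.0 + 0.0, 10.0 + 10.0
```

Applying the rule inductively, from the original variables outward, yields finite bounds for every new variable as long as each single-variable function is bounded on its interval.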
Consider the separable problem created in section 3. Since $y_3 = y_1 + y_2$ and
the bounds on both $y_1$ and $y_2$ are 0 and 10,

$$0 \le y_3 \le 20.$$

(Recall $y_4$ was eliminated from the problem as unnecessary.) Now

$$z^{5,1}_{1,2} = [\sin(y_1) + \exp(-.5 y_2)]/2$$

and

$$z^{5,2}_{1,2} = [\sin(y_1) - \exp(-.5 y_2)]/2.$$

Since

$$-1 = \inf \sin(y_1) \le \sin(y_1) \le \sup \sin(y_1) = 1, \quad \text{where } 0 \le y_1 \le 10,$$

and

$$0 \le \inf \exp(-.5 y_2) \le \exp(-.5 y_2) \le \sup \exp(-.5 y_2) = 1, \quad \text{where } 0 \le y_2 \le 10,$$

it follows that the bounds on these new variables are

$$-.5 \le z^{5,1}_{1,2} \le 1,$$

$$-1 \le z^{5,2}_{1,2} \le .5.$$
5. Computer Solution To Simple Problem


The  computer  program  which  implements  Falk's  branch  and  bound  procedure  (see 
section  1)  is  called  NUGLOBAL  and  is  available  from  the  Operations  Research 
Division  of  the  National  Bureau  of  Standards.  It  uses  a linear  programming 
package  called  SEXOP  written  by  Dr.  Roy  E.  Marsten. 

The  user  is  required  to  write  a FORTRAN  subroutine  (GETPHI)  which  supplies  the 
values  of  the  nonlinear  functions  defining  the  separable  optimization  problem. 
Input  to  this  routine  is  the  constraint  index  i,  variable  index  j,  and 
variable  value  Xj.  The  output  is  Fij(xj).  For  the  simple  two  variable 
example  the  SUBROUTINE  GETPHI  is  listed  in  Appendix  A. 

The  user  is  also  required  to  supply  a data  file.  The  one  used  for  this 
example  is  given  in  Appendix  A.  (This  corresponds  to  the  Run  3 to  be 
discussed  next.) 

Computer  instructions  on  how  to  use  the  NUGLOBAL  system  are  in  [Hoffman 
1975]  . 

For notational convenience and to conform to the separable format, problem
(3.2) is rewritten

$$\min_{(x_1, \ldots, x_5)}\ \mathrm{ERF}(x_3) + (x_4)^2 - (x_5)^2$$

subject to:

$$x_1 + x_2 - x_3 = 0,$$

$$-\sin(x_1) + x_4 + x_5 = 0,$$

$$-\exp(-.5 x_2) + x_4 - x_5 = 0,$$

$$-(x_3)^2 \le -10,$$

$$0 \le x_1 \le 10,$$

$$0 \le x_2 \le 10,$$

$$0 \le x_3 \le 20,$$

$$-.5 \le x_4 \le 1,$$

$$-1 \le x_5 \le .5.$$

For "Run 1" the first variable was "divided into nine variables", i.e.
$p_1 = 8$ (see Section 1). That is, in the linear programming problem
$\theta_{1,0}, \ldots, \theta_{1,8}$ "represented" $x_1$. Variables $x_2$, $x_3$, $x_4$, $x_5$ were replaced by 6
variables each. The resulting linear programming problems had 33 $\theta$
variables.

To  get  the  global  minimizer  to  problem  (P)  required  the  solution  of  fifteen 
linear  programming  problems.  The  solution  obtained  was 

x*  = (5.0,  0.0,  5.0,  0.020538,  -0.97946). 

The  value  of  the  piece-wise  linear  approximation  to  the  objective  function  was 
(.052861).  The  value  of  the  true  objective  function  at  this  point  is 
(0.04107572) . 

The second run ("Run 2") used $p_j = 10$, $j = 1, \ldots, 5$. This created linear
programming problems with 55 $\theta$ variables. This required the solution of 12
linear programming problems to obtain the global solution to Problem (P).
The solution obtained was the same as in Run 1, x* = (5.0, 0.0, 5.0, 0.020538,
-0.97946). The value of the piece-wise linear objective function was
(0.04402). The true objective function value there is, as before, (0.04107).
These two values are closer than in Run 1 because of the finer grid
approximation in Run 2.

In Run 3, $p_j = 25$, $j = 1, \ldots, 5$. Each $x_j$ was represented by 26 $\theta$ variables.
Each linear programming problem had 130 variables. Nine linear programs were
required to find the solution x* = (4.8, 0.0, 4.8, 0.0019177, -0.99808). The
true objective function value there is (0.00383) with piece-wise linear
approximate value of (0.00455). These results are summarized in Table 1.
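The reported solution vectors can be checked against the equality constraints of the rewritten problem. The following sketch recomputes $x_3$, $x_4$, $x_5$ from $(x_1, x_2)$ and evaluates the true (not piece-wise linear) objective; the function names are illustrative, and ERF is again assumed to be the standard error function.

```python
import math

def separable_point(x1, x2):
    """Values of x3, x4, x5 implied by the three equality constraints."""
    x3 = x1 + x2
    x4 = 0.5 * (math.sin(x1) + math.exp(-0.5 * x2))
    x5 = 0.5 * (math.sin(x1) - math.exp(-0.5 * x2))
    return x3, x4, x5

def true_objective(x1, x2):
    x3, x4, x5 = separable_point(x1, x2)
    return math.erf(x3) + x4 ** 2 - x5 ** 2

# Runs 1 and 2 stopped at x1 = 5.0, Run 3 at x1 = 4.8; 3*pi/2 is the true minimizer.
for x1 in (5.0, 4.8, 1.5 * math.pi):
    print(x1, separable_point(x1, 0.0), true_objective(x1, 0.0))
```

The recomputed values agree with the reported fourth and fifth components and with the quoted true objective values, so the remaining error lies entirely in where the grid allows $x_1$ to sit.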


               # of LP   LPs
               Vars.     Solved   x1        x2     x3      x4       x5

Run 1          33        15       5.0       0.0    5.0     .0205    -0.9795
Run 2          55        12       5.0       0.0    5.0     .0205    -0.9795
Run 3          130       9        4.8       0.0    4.8     .0019    -0.9981
Theoretical                       4.71      0.0    4.71    0.0      -1.0
Solution                          (=3π/2)

Table 1 - Computer Effort Required to Solve LP Approximating Problems
6. Direct Solution of Simple Problem


Another approach for finding global minimizers to nonconvex programming
problems (see [McCormick 1976] for more details) is to solve a sequence of
"convex underestimating problems." Convex programming problems are problems
which can be written:

$$\text{minimize } f(x)$$

subject to

$$g_i(x) \ge 0, \quad i = 1, \ldots, m,$$

where the functions $f$, $\{-g_i\}_{i=1}^{m}$ are convex functions.


The  nice  property  of  convex  programming  problems  is  that  local  minimizers  are 
global  minimizers.  The  isovalue  contours  resemble  ellipses,  parabolas,  or 
lines.  It  is  clear  from  Figure  2 that  the  objective  function  of  problem  (2.2) 
is  not  a convex  function. 

The constraints of a convex programming problem define a convex set for the
feasible region. For this simple problem, the constraint $(x_1 + x_2)^2 \ge 10$ does
not define a convex set. However, the intersection of the set of points
satisfying this with the feasible region is a convex set.

A convex underestimating problem for a general problem is one where the
objective function is convex and underestimates the given objective function
in the feasible region and where the constraint region is a convex set and
contains the original feasible region. Presented in [McCormick 1976] is an
algorithm for finding a convex underestimating problem when the original
optimization problem is factorable with bounded variables.

The key to implementing this approach is the ability to compute, for a
function of a single variable in an interval, its convex envelope. This is
the highest convex function which underestimates the function in the interval.
Thus, for example, the convex envelope of $\sin(x_1)$ where $0 \le x_1 \le 10$ is

$$\operatorname{vexsin}(x_1) = \begin{cases}
-0.21723\, x_1, & 0 \le x_1 \le 4.49342, \\
\sin(x_1), & 4.49342 \le x_1 \le 4.79945, \\
-1.41353 + 0.08695\, x_1, & 4.79945 \le x_1 \le 10.
\end{cases}$$

This is shown in Figure 3, where the notation vex f(·) is used. Proper
notation would indicate the interval. Thus vexsin$[x_1, 0, 10]$ should be used,
but for notational simplicity this is not done.

Figure 3. Convex Envelope of Sine Function on [0, 10]
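The breakpoints and slopes of the convex envelope of the sine function on [0, 10] can be recovered from two tangency conditions: a line through (0, 0) tangent to the sine curve, and a line through (10, sin 10) tangent to it. The sketch below solves both conditions by bisection; the bracketing intervals and the helper name `bisect` are assumptions chosen for this illustration.

```python
import math

def bisect(g, lo, hi, iters=80):
    """Root of g on [lo, hi], assuming g(lo) and g(hi) differ in sign."""
    glo = g(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (g(mid) > 0) == (glo > 0):
            lo, glo = mid, g(mid)
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Tangent line through the origin: sin(a) = a * cos(a).
A = bisect(lambda x: x * math.cos(x) - math.sin(x), 4.0, 4.7)
# Tangent line through (10, sin 10): sin(10) - sin(b) = (10 - b) * cos(b).
B = bisect(lambda x: math.sin(x) + (10.0 - x) * math.cos(x) - math.sin(10.0),
           4.7, 5.0)

def vexsin(x):
    """Convex envelope of sin on [0, 10]: line, sine arc, line."""
    if x <= A:
        return math.cos(A) * x
    if x <= B:
        return math.sin(x)
    return math.sin(B) + math.cos(B) * (x - B)

print(A, math.cos(A), B, math.cos(B))
```

The computed breakpoints and slopes can be compared directly with the constants quoted in the piecewise formula.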


Using the general techniques in the above referenced article, a convex
underestimating function for the objective function of example problem (2.2)
will be constructed.

Since $0 \le x_1 \le 10$ and $0 \le x_2 \le 10$, then $0 \le x_1 + x_2 \le 20$. Consider

$$\Phi(x_1 + x_2) = k \int_0^{x_1 + x_2} \exp(-t^2)\, dt$$

in this interval. It is clear from Figure 4 that $.05(x_1 + x_2)$ is a "high"
convex underestimating function in this interval, to a close
approximation; that is,

$$\Phi(x_1 + x_2) \ge .05(x_1 + x_2). \qquad (6.1)$$

Figure 4. A Convex Underestimating Function


Since $0 \le x_1 \le 10$, then $-1 \le \sin(x_1) \le 1$. Since $0 \le x_2 \le 10$, then

$$c_1 = \exp(-5) \le \exp(-.5 x_2) \le 1.$$

Thus,

$$[\sin(x_1) + 1]\,[\exp(-.5 x_2) - c_1] \ge 0,$$

or

$$\sin(x_1) \cdot \exp(-.5 x_2) \ge -\exp(-.5 x_2) + c_1 \sin(x_1) + c_1$$

$$\ge -1 + c_2 x_2 + c_1 [\operatorname{vexsin}(x_1)] + c_1, \qquad (6.2)$$

where

$$c_2 = (-c_1 + 1)/10 = .0993262.$$

Note that $-1 + .099326\, x_2$ is the convex envelope of $-\exp(-.5 x_2)$ in the
interval [0, 10].

Also,

$$[\sin(x_1) - 1]\,[\exp(-.5 x_2) - 1] \ge 0,$$

or

$$\sin(x_1) \cdot \exp(-.5 x_2) \ge \exp(-.5 x_2) + \sin(x_1) - 1$$

$$\ge \exp(-.5 x_2) + \operatorname{vexsin}(x_1) - 1. \qquad (6.3)$$

Since the maximum of two convex functions is convex, a convex underestimating
function for the product takes the form (combining (6.2) and (6.3))

$$\sin(x_1) \cdot \exp(-.5 x_2) \ge \max[\,c_1 - 1 + c_2 x_2 + c_1 \operatorname{vexsin}(x_1),\ \exp(-.5 x_2) + \operatorname{vexsin}(x_1) - 1\,]. \qquad (6.4)$$


Using this with (6.1) gives the full convex underestimating function:

$$k \int_0^{x_1 + x_2} \exp(-t^2)\, dt + \sin(x_1) \cdot \exp(-.5 x_2) \qquad (6.5)$$

$$\ge .05(x_1 + x_2) + \max[\,c_1 - 1 + c_2 x_2 + c_1 \operatorname{vexsin}(x_1),\ \exp(-.5 x_2) + \operatorname{vexsin}(x_1) - 1\,].$$

Isovalue contours of this function are plotted in Figure 5.
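The underestimating property of the right-hand side of (6.5) can be checked on a grid. The sketch below uses the rounded constants quoted in the text for the convex envelope of the sine function, so a small tolerance is allowed; the function names are illustrative. It also evaluates the point on the sine arc where, along $x_2 = 0$, the derivative $.05 + \cos(x_1)$ of the underestimating function vanishes.

```python
import math

C1 = math.exp(-5.0)
C2 = (1.0 - C1) / 10.0

def vexsin(x):
    # Piecewise form with the rounded constants quoted in the text.
    if x <= 4.49342:
        return -0.21723 * x
    if x <= 4.79945:
        return math.sin(x)
    return -1.41353 + 0.08695 * x

def objective(x1, x2):
    return math.erf(x1 + x2) + math.sin(x1) * math.exp(-0.5 * x2)

def underestimate(x1, x2):
    v = vexsin(x1)
    e = math.exp(-0.5 * x2)
    return 0.05 * (x1 + x2) + max(C1 - 1.0 + C2 * x2 + C1 * v, e + v - 1.0)

# Largest amount by which the underestimate exceeds the objective on a grid.
worst = max(underestimate(0.1 * i, 0.1 * j) - objective(0.1 * i, 0.1 * j)
            for i in range(101) for j in range(101))

# Stationary point along x2 = 0: cos(x1) = -.05 with pi < x1 < 3*pi/2.
x1_star = math.pi + math.acos(0.05)
print(worst, x1_star)
```

On the grid the underestimate never exceeds the objective by more than rounding error, and the stationary point agrees with the first component of the minimizer reported for (6.5).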
The global minimizer for function (6.5) in the rectangle $0 \le x_1 \le 10$,
$0 \le x_2 \le 10$ is (4.6623681, 0.). The first component is a solution of the
equation $\cos(x_1) = -.05$. This is to be compared with the true solution of
the original problem (4.712389, 0.).


Development of the convex underestimating function above took no advantage of
the restriction that

$$-(x_1 + x_2)^2 \le -10$$

must hold. A convex underestimating function of $-(x_1 + x_2)^2$ in the rectangle
$0 \le x_i \le 10$, $i = 1, 2$, is

$$-20\,(x_1 + x_2) \le -(x_1 + x_2)^2.$$

Thus the constraint can be imposed

$$-20\,(x_1 + x_2) \le -10,$$

or

$$x_1 + x_2 \ge .5.$$


The convex underestimating function of the objective function of (2.2) can
now be restricted to the region where $.5 \le (x_1 + x_2) \le 20$. This gives the
tighter convex underestimating function

$$.024593\,(x_1 + x_2) + .5091315.$$

Using this with (6.4) gives a tighter convex underestimating function and the
resulting solution of this problem is (4.687793, 0.). The first component is
a solution of $\cos(x_1) = -.024593$.
Figure 5. Isovalue Contours of Convex Underestimating Function


7. Application To Nonlinear Least Squares


An important source of nonlinear programming problems is the area of parameter
estimation. In particular the finding of the "best" parameter values which
define a functional form often gives rise to an unconstrained, nonconvex
optimization problem.

Let x be a vector of "independent" variables, and y a "dependent" variable
which is thought to be related to x via the equation

$$y = f(x, E),$$

where E is a vector of parameters and f is a predetermined functional form.

Let $\{y_i\}$, $i = 1, \ldots, r$ be observations of y simultaneous with observations
$\{x_i\}$, $i = 1, \ldots, r$ of the independent variables. There is a body of theory
(see [Goldfeld and Quandt 1972]) which states, under the appropriate
assumptions, that the "best" values of $\{E_j\}$ are those which solve the problem

$$\min_{E}\ \sum_{i=1}^{r} [\,y_i - f(x_i, E)\,]^2.$$

In  the  reference  above  is  also  a theory  of  the  probability  distribution  of  the 
point  estimate  given  by  the  above  problem. 

When  f is  linear  in  E,  this  is  the  well-known  linear  least  squares  problem  and 
the  function  minimized  is  a positive  (semi)  definite  quadratic  form.  Local 
minimizers  of  this  are  global  minimizers. 
When f is nonlinear in the parameters $\{E_j\}$ the problem is much more
complicated and the possibility that local, but not truly global, solutions of
that problem exist must be considered. When the $E_j$'s are physical constants
to be estimated, as in many scientific applications, this difficulty has
serious implications.

In this section a small parameter estimation problem is considered. The
general separation method given in Section 3 plus some ad hoc techniques are
used to create an equivalent separable problem. This is in turn solved by the
solution algorithm of Section 1.

It  should  be  noted  that  the  number  of  variables  in  the  separated  problem  is 
greater  than  or  equal  to  the  number  of  parameters  to  be  estimated  plus  the 
number  of  observations  (r).  If  the  function  f is  complicated,  more  separation 
is  required  and  this  can  give  rise  to  a separable  problem  in  a large  number  of 
variables  with  an  accompanying  large  linear  approximation  problem. 

In Figure 6 is plotted the fraction of college men married versus age. The
data for the problem are in Table 2. It is thought that the dependent variable
y (fraction of college men married) is related to the independent variable x
(age in years) by the functional equation

$$y = \rho\, \Phi[(x - \mu)/\sigma]$$

where

$$\Phi(z) = \int_{-\infty}^{z} \phi(t)\, dt,$$

and

$$\phi(t) = (2\pi)^{-.5} \exp(-t^2/2).$$

(The normal density function.)

The parameters, whose best values are to be estimated from the data of Table 2,
are $\rho$ (the ultimate fraction of these married), $\mu$ (the mean of the
distribution) and $\sigma$ (the standard deviation).


i       1       2       3       4       5       6       7

x_i     14.6    16.8    18.7    20.6    23.1    27.1    32.0

y_i     0.000   0.004   0.015   0.075   0.315   0.571   0.737

Table 2. Data for Example

Figure 6. Plotted Data for Example


The optimization problem to be solved is

$$\min_{(\mu, \sigma, \rho)}\ \sum_{i=1}^{7} \{\, y_i - \rho\, \Phi([x_i - \mu]/\sigma) \,\}^2. \qquad (7.1)$$

As a first step in converting this to a separable problem set

$$v_i = \rho\, \Phi[(x_i - \mu)/\sigma], \quad i = 1, \ldots, 7, \qquad (7.2)$$

and

$$z_i = (x_i - \mu)/\sigma, \quad i = 1, \ldots, 7. \qquad (7.3)$$

A trick not mentioned in section 3 but one which is often useful is to
transform a product into a sum by taking logarithms. Doing this to (7.2)
yields

$$\ln v_i = \ln \rho + \ln \Phi(z_i), \quad i = 1, \ldots, 7.$$

By setting

$$\mu = \omega_1 + \omega_2,$$

$$1/\sigma = \omega_1 - \omega_2,$$

the original nonseparable problem becomes

$$\text{minimize } \sum_{i=1}^{7} (y_i - v_i)^2$$

subject to

$$\ln \rho - \ln v_i + \ln \Phi(z_i) = 0, \quad i = 1, \ldots, 7,$$

$$x_i (\sigma^{-1}) - z_i - (\omega_1)^2 + (\omega_2)^2 = 0, \quad i = 1, \ldots, 7,$$

$$\mu - \omega_1 - \omega_2 = 0,$$

$$\sigma^{-1} - \omega_1 + \omega_2 = 0,$$

where the minimization is performed over $(\mu, 1/\sigma, \ln \rho, \{v_i\}, \{z_i\}, \omega_1, \omega_2)$.
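The separated formulation can be checked by mapping any parameter triple to the nineteen separated variables and evaluating the sixteen equality constraints, all of which should vanish. The sketch below uses the data of Table 2; the function names are hypothetical, and the normal distribution function is computed from `math.erf`.

```python
import math

X = [14.6, 16.8, 18.7, 20.6, 23.1, 27.1, 32.0]
Y = [0.000, 0.004, 0.015, 0.075, 0.315, 0.571, 0.737]

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def separated_variables(mu, sigma, rho):
    inv_sigma = 1.0 / sigma
    w1 = 0.5 * (mu + inv_sigma)    # mu = w1 + w2
    w2 = 0.5 * (mu - inv_sigma)    # 1/sigma = w1 - w2
    z = [(x - mu) / sigma for x in X]
    v = [rho * normal_cdf(zi) for zi in z]
    return inv_sigma, w1, w2, z, v

def residuals(mu, sigma, rho):
    """Left-hand sides of the equality constraints of the separated problem."""
    inv_sigma, w1, w2, z, v = separated_variables(mu, sigma, rho)
    r = [math.log(rho) - math.log(vi) + math.log(normal_cdf(zi))
         for vi, zi in zip(v, z)]
    r += [x * inv_sigma - zi - w1 ** 2 + w2 ** 2 for x, zi in zip(X, z)]
    r += [mu - w1 - w2, inv_sigma - w1 + w2]
    return r

def sum_of_squares(mu, sigma, rho):
    v = separated_variables(mu, sigma, rho)[4]
    return sum((yi - vi) ** 2 for yi, vi in zip(Y, v))

print(max(abs(r) for r in residuals(22.54, 3.31, 0.72)))
```

The second group of constraints works because $\mu/\sigma = (\omega_1 + \omega_2)(\omega_1 - \omega_2) = (\omega_1)^2 - (\omega_2)^2$, which is Step 1 of Section 3 applied to the product of $\mu$ and $1/\sigma$.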
Note that the separated problem contains $1/\sigma$ and $\ln \rho$ instead of $\sigma$ and $\rho$.
This was done because the new formulation is equivalent and is linear in the
transformed parameters (which are variables of the optimization problem).

From the physical model it is obvious that the parameters are constrained as
$\mu > 0$, $\sigma > 0$, $0 < \rho < 1$. If the data are consistent with the model it is expected
that these constraints are implicit and will not be binding at the solution.

One  requirement  of  the  global  minimization  technique  outlined  in  Section  1 is 
that  lower  and  upper  bounds  are  required  on  the  problem  variables.  It  should 
be  pointed  out  that  this  can  be  an  advantage  since  often,  from  knowledge  of 
the  problem,  good  bounds  are  available.  This  can  reduce  the  region  in  which 
optimization  takes  place  and  thus  cut  down  on  the  amount  of  computer  effort 
required  to  solve  it. 

From the data, obvious bounds on the original three parameters are:
$18 \le \mu \le 30$, $1 \le \sigma \le 10$, and $.5 \le \rho \le 1$. The techniques of Section 4 are then used to
place bounds on the new variables of the separated problem. The names of the
variables of the separated problem, their $x_j$ indices and their upper and lower
bounds are given in Table 3.


 j    Variable   Lower     Upper    Cuts     Cuts
      Name       Bound     Bound    Run 1    Run 2

 1    μ          18.       30.      1        1
 2    1/σ        .1        1.       1        1
 3    ln ρ       -.693     0.       1        1
 4    v1         .0006     .37      5        5
 5    v2         .0006     .45      5        5
 6    v3         .0006     .76      5        5
 7    v4         .0006     1.       5        5
 8    v5         .0006     1.       5        5
 9    v6         .0009     1.       5        5
10    v7         .280      1.       5        5
11    z1         (-3.)     -.34     3        5
12    z2         -3.       -.12     3        5
13    z3         -3.       0.7      3        5
14    z4         -3.       2.6      3        5
15    z5         -3.       3.       3        5
16    z6         -2.9      3.       3        5
17    z7         .2        3.       3        5
18    ω1         8.0       16.      5        5
19    ω2         8.0       16.      5        5

Table 3: Data for Least Squares Problem in Separable Form
A modified Newton method was applied directly to problem (7.1) and gave the
solution

μ = 24.4, σ = 3.28, ρ = 0.73.

Two computer runs were made on the separated problem. The data files and
subroutine are in Appendix B.

The number of cuts for each of the nineteen variables is given in Table 3.
The number of θ-variables in the linear programming problems associated with
each of the variables in the separated problem is equal to the number of cuts
plus one. Thus in Run 1 each linear programming problem had 88 variables, and
in Run 2 each had 102 variables. The lower bound for z1 in Run 1 was -3 and
it was -5 in Run 2.
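The variable counts follow directly from the cuts listed in Table 3, since a variable with c cuts is represented by c + 1 weights. A short check, with the hypothetical helper `theta_count`:

```python
def theta_count(cuts):
    """Number of weight variables when a variable with c cuts gets c + 1 weights."""
    return sum(c + 1 for c in cuts)

# Cuts per variable in the order of Table 3:
# three transformed parameters, v1..v7, z1..z7, and the two omega variables.
run1 = [1] * 3 + [5] * 7 + [3] * 7 + [5] * 2
run2 = [1] * 3 + [5] * 7 + [5] * 7 + [5] * 2

print(theta_count(run1), theta_count(run2))
```

The same rule reproduces the counts quoted for the two variable example: nine weights for the first variable plus six for each of the other four gives 33.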

For Run 1 the approximate solution was obtained after the solution of 41
linear programming problems and was

μ = 22.19, σ = 4.28, and ρ = .57.

For Run 2 the approximate solution was obtained after the solution of 62
linear programming problems with values

μ = 22.54, σ = 3.31, and ρ = .72.

These  results  are  summarized  in  Table  4. 
Method    μ        σ       ρ       # of         MP Problems
                                   Variables    Solved

Newton    24.14    3.28    0.73    3            1 (NLP)
Run 1     22.19    4.28    0.57    88           41 (LP)
Run 2     22.54    3.31    0.72    102          62 (LP)

Table 4: Data Relating to Solution of Least Squares Problem
8. Discussion of Results and Areas of Future Research


In  this  paper  it  was  shown  how  general  nonlinear  programming  problems  can 
automatically  be  converted  to  equivalent  separable  problems  when  they  consist 
of  factorable  functions.  Global  solutions  to  linear  approximations  of  these 
separable  programs  are  then  obtained  using  a branch  and  bound  technique  with 
linear  programming  subproblems.  Two  small  examples  were  solved  this  way 
illustrating  the  feasibility  of  the  approach. 

First, experiments should be made on the placing of the points which determine
the piece-wise linear approximations to the nonlinear functions of a single
variable. In solving example (3.1), refinement of the grid from Run 1 to Run
2 did not result in any improvement in the solution approximation (see Table
1). The final answer obtained (Run 3) only agreed with the correct solution
to one significant place even though the number of θ variables approximating
the nonlinear functions was increased significantly. The reason for this was
that the grid refinement strategy used allowed only for equally spaced
intervals. Judicious placement should result in better accuracy and
fewer θ-variables.

Recent efforts in this area are encouraging. In [Grotzinger 1981] problem
(1.2) was solved to the accuracy (4.20,0.) requiring 11 linear programming
problems, none of which had more than 25 θ-variables.
The second area of improvement is to lower the effort required to solve the
linear programming problems by using the inverse matrix (basis) information
available from similar LP's solved by the branch and bound procedure. This
should considerably reduce the computation time. No advantage was taken of
available partial basis information in solving the problems.

Advantage  can  also  be  taken  of  this  information  when  new  variables  are  added 
to  improve  the  linear  approximation. 

Until  efficient  nonlinear  programming  codes  are  available,  using  linear 
programming  codes  as  subroutines  to  solve  general  nonlinear  programming 
problems  seems  a reasonable  approach  and  one  which  could  be  implemented  on  a 
production  basis  without  much  additional  effort. 